Better and Simpler Error Analysis of the Sinkhorn-Knopp Algorithm for Matrix Scaling
Authors
Abstract
Given a non-negative n × m real matrix A, the matrix scaling problem is to determine whether it is possible to scale the rows and columns so that each row and each column sums to a specified target value. The matrix scaling problem arises in many algorithmic applications, perhaps most notably as a preconditioning step in solving linear systems of equations. One of the most natural, and by now classical, approaches to matrix scaling is the Sinkhorn-Knopp algorithm (also known as the RAS method), in which one alternately scales all rows or all columns to meet the target values. In addition to being extremely simple and natural, another appeal of this procedure is that it lends itself easily to parallelization. A central question is to understand the rate of convergence of the Sinkhorn-Knopp algorithm: given a suitable error metric to measure deviations from the target values, and an error bound ε, how quickly does the Sinkhorn-Knopp algorithm converge to an error below ε? While several non-trivial convergence results are known for the Sinkhorn-Knopp algorithm, perhaps somewhat surprisingly, even for natural error metrics such as the l1-error or the l2-error, this is not entirely understood. In this paper, we present an elementary convergence analysis for the Sinkhorn-Knopp algorithm that improves upon the previous best bound. In a nutshell, our approach is to show (i) a simple bound on the number of iterations needed so that the KL-divergence between the current row-sums and the target row-sums drops below a specified threshold δ, and (ii) that for a suitable choice of δ, whenever the KL-divergence is below δ, the l1-error or the l2-error is below ε. The well-known Pinsker's inequality immediately allows us to translate a bound on the KL-divergence into a bound on the l1-error. To bound the l2-error in terms of the KL-divergence, we establish a new inequality, referred to as (KL vs l1/l2). This new inequality is a strengthening of Pinsker's inequality that we believe is of independent interest. Our analysis of the l2-error significantly improves upon the best previous convergence bound for the l2-error. The idea of studying Sinkhorn-Knopp convergence via the KL-divergence is not new and has indeed been explored before. Our contribution is an elementary, self-contained presentation of this approach and an interesting new inequality that yields a significantly stronger convergence guarantee for the extensively studied l2-error.
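For concreteness, here is a minimal NumPy sketch of the Sinkhorn-Knopp iteration combined with the KL-based stopping rule outlined in the abstract. It is an illustration under assumed conventions (the names sinkhorn_knopp, A, r, c, and eps, and the particular choice of δ, are ours, not the paper's), not the authors' implementation; it assumes strictly positive inputs for simplicity.

```python
import numpy as np

def sinkhorn_knopp(A, r, c, eps=1e-6, max_iters=10_000):
    """Alternately scale the columns and rows of a positive matrix A toward
    target column sums c and row sums r (assumed to satisfy
    r.sum() == c.sum(), with r, c, and A strictly positive)."""
    h = r.sum()
    # Two-step analysis from the abstract: iterate until the KL-divergence
    # between target and current row sums drops below delta, where delta is
    # chosen so that Pinsker's inequality, ||r - s||_1 <= sqrt(2 * h * KL),
    # forces the l1-error below eps.
    delta = eps ** 2 / (2.0 * h)
    B = A.astype(float).copy()
    for _ in range(max_iters):
        B *= c / B.sum(axis=0)            # scale each column to its target c_j
        s = B.sum(axis=1)                 # current row sums; s.sum() == h now
        kl = np.sum(r * np.log(r / s))    # KL(r || s) for equal-mass vectors
        if kl <= delta:
            return B
        B *= (r / s)[:, None]             # scale each row to its target r_i
    return B
```

For the classical doubly stochastic case of an n × n matrix, one would take r = c = np.ones(n); by the choice of δ and Pinsker's inequality, the returned matrix has l1 row-sum error at most eps.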
Similar Resources
The Sinkhorn-Knopp Algorithm: Convergence and Applications
Provided that a square nonnegative matrix A contains sufficient nonzero elements, the Sinkhorn-Knopp algorithm can be used to balance the matrix, that is, to find a diagonal scaling of A that is doubly stochastic. It is known that the convergence is linear, and an upper bound has been given on the rate of convergence for positive matrices. In this paper we give an explicit expression for the ...
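The linear convergence mentioned in this abstract is easy to observe empirically. The sketch below (an assumed setup in the same NumPy style as above, not code from the paper) balances a positive square matrix and records the ratios of successive column-sum errors; under linear convergence these ratios settle near a constant, the convergence rate.

```python
import numpy as np

def balance_and_rate(A, iters=50):
    """Run Sinkhorn-Knopp on a positive square matrix A; return the balanced
    matrix and the ratios of successive errors, which approach the linear
    convergence rate."""
    B = A.astype(float).copy()
    errs = []
    for _ in range(iters):
        B /= B.sum(axis=0)                  # normalize columns to sum to 1
        B /= B.sum(axis=1, keepdims=True)   # normalize rows to sum to 1
        errs.append(np.abs(B.sum(axis=0) - 1.0).sum())  # residual column error
    rates = [b / a for a, b in zip(errs, errs[1:]) if a > 0]
    return B, rates
```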
A Fast Algorithm for Matrix Balancing
Provided that a square nonnegative matrix A contains sufficient nonzero elements, the matrix can be balanced; that is, we can find a diagonal scaling of A that is doubly stochastic. A number of algorithms have been proposed to achieve the balancing, the best known of these being the Sinkhorn-Knopp algorithm. In this paper we derive new algorithms based on inner-outer iteration schemes. We...
Demystifying Symmetric Smoothing Filters
Many patch-based image denoising algorithms can be formulated as applying a smoothing filter to the noisy image. Expressed as matrices, the smoothing filters must be row normalized so that each row sums to unity. Surprisingly, if we apply a column normalization before the row normalization, the performance of the smoothing filter can often be significantly improved. Prior works showed that such...
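As a toy illustration of the column-before-row normalization described in this abstract, consider the following sketch (our own minimal example; sinkhorn_smooth, W, and y are hypothetical names, and real denoisers would build the weight matrix W from image patches):

```python
import numpy as np

def sinkhorn_smooth(W, y, col_first=True):
    """Build a row-stochastic smoothing filter from non-negative weights W
    and apply it to the signal y. With col_first=True, columns are
    normalized before the final row normalization, the extra Sinkhorn-style
    step that the abstract above reports can improve denoising."""
    F = W.astype(float).copy()
    if col_first:
        F /= F.sum(axis=0)                # column normalization first
    F /= F.sum(axis=1, keepdims=True)     # rows sum to one: a valid smoother
    return F @ y
```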
Stabilized Sparse Scaling Algorithms for Entropy Regularized Transport Problems
Scaling algorithms for entropic transport-type problems have become a very popular numerical method, encompassing Wasserstein barycenters, multi-marginal problems, gradient flows and unbalanced transport. However, a standard implementation of the scaling algorithm has several numerical limitations: the scaling factors diverge and convergence becomes impractically slow as the entropy regularizat...
Regularized Optimal Transport and the Rot Mover's Distance
This paper presents a unified framework for smooth convex regularization of discrete optimal transport problems. In this context, the regularized optimal transport turns out to be equivalent to a matrix nearness problem with respect to Bregman divergences. Our framework thus naturally generalizes a previously proposed regularization based on the Boltzmann-Shannon entropy related to the Kullback...